
[config change] Improve timeboost implementation #2860

Merged: 8 commits from improve-timeboost-code into express-lane-timeboost on Jan 9, 2025

Conversation

@ganeshvanahalli (Contributor) commented on Dec 31, 2024

This PR addresses review comments on the main PR, particularly this thread #2561 (review)

We add a new config option to the Timeboost config:

`--execution.sequencer.timeboost.max-queued-tx-count` -> the maximum allowed number of express lane txs with future sequence numbers that may be queued. Set to 0 to disable this check, or to a negative value to prevent queuing of any future-sequence-number transactions.
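
For context, here is a minimal sketch of how such an option is typically wired into a gethexec-style config struct with koanf tags and pflag registration. The struct field, default value, and helper name below are illustrative assumptions, not the PR's actual diff:

```go
package gethexec

import (
	flag "github.com/spf13/pflag"
)

// Illustrative sketch only: field name, type, and default are assumptions.
type TimeboostConfig struct {
	Enable           bool  `koanf:"enable"`
	MaxQueuedTxCount int64 `koanf:"max-queued-tx-count"`
}

func TimeboostAddOptions(prefix string, f *flag.FlagSet) {
	f.Bool(prefix+".enable", false, "enable timeboost based on express lane auctions")
	f.Int64(prefix+".max-queued-tx-count", 10, // illustrative default
		"maximum allowed number of express lane txs with future sequence numbers to be queued; 0 disables the check, a negative value disables queuing entirely")
}
```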

@cla-bot added the `s` label (automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA) on Dec 31, 2024

codecov bot commented Jan 6, 2025

Codecov Report

Attention: Patch coverage is 1.20000% with 247 lines in your changes missing coverage. Please review.

Project coverage is 21.59%. Comparing base (d48436d) to head (b1aefc7).

Additional details and impacted files
@@                    Coverage Diff                     @@
##           express-lane-timeboost    #2860      +/-   ##
==========================================================
+ Coverage                   21.55%   21.59%   +0.03%     
==========================================================
  Files                         286      287       +1     
  Lines                       42546    42462      -84     
==========================================================
- Hits                         9170     9168       -2     
+ Misses                      31882    31803      -79     
+ Partials                     1494     1491       -3     

@Tristan-Wilson (Member) left a comment:


Minor changes only.

type transactionPublisher interface {
	PublishTimeboostedTransaction(context.Context, *types.Transaction, *arbitrum_types.ConditionalOptions, chan struct{}) error
	Config() *SequencerConfig
@Tristan-Wilson (Member) commented:

This seems to me like a break in the abstraction. Can you just pass in the config parameters you need on creation of the expressLaneService instead of fetching them this way?

@ganeshvanahalli (Contributor, Author) replied:

I agree. The parameters we use, MaxBlockSpeed and QueueTimeout, are both hot-reloadable, so I think passing a SequencerConfigFetcher on creation should work here.
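
A rough sketch of the suggested shape (not the PR's exact code): the express lane service takes the config fetcher at construction instead of reaching through the transactionPublisher interface. Names other than MaxBlockSpeed and QueueTimeout are illustrative, and a `time` import is assumed:

```go
// Sketch: constructor-injected config fetcher instead of publisher.Config().
type SequencerConfigFetcher func() *SequencerConfig

type expressLaneService struct {
	seqConfigFetcher SequencerConfigFetcher
	// other fields elided
}

func newExpressLaneService(configFetcher SequencerConfigFetcher) *expressLaneService {
	return &expressLaneService{seqConfigFetcher: configFetcher}
}

// Reading through the fetcher at call time means hot-reloaded values such as
// MaxBlockSpeed and QueueTimeout take effect without recreating the service.
func (es *expressLaneService) maxBlockSpeed() time.Duration {
	return es.seqConfigFetcher().MaxBlockSpeed
}
```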

execution/gethexec/express_lane_service.go (outdated, resolved):
maxBlocksPerRound := es.roundTimingInfo.Round / maxBlockSpeed
fromBlock = latestBlock.Number.Uint64()
// #nosec G115
if fromBlock > uint64(maxBlocksPerRound) {
@Tristan-Wilson (Member) commented:

Good catch. On startup I think the old code would miss the most recent event, right?

@ganeshvanahalli (Contributor, Author) replied:

Yep, that's right.
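
For readers following along, a sketch of what the startup lookback under discussion might look like once the truncated snippet above is completed. The subtraction branch is an assumption inferred from the review comment, not the exact code from the PR:

```go
// Look back roughly one round's worth of blocks on startup so the most recent
// AuctionResolved / SetExpressLaneController events are not missed.
maxBlocksPerRound := es.roundTimingInfo.Round / maxBlockSpeed
fromBlock := latestBlock.Number.Uint64()
// #nosec G115
if fromBlock > uint64(maxBlocksPerRound) {
	fromBlock -= uint64(maxBlocksPerRound)
} else {
	fromBlock = 0
}
```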

// Put into the sequence number map.
resultChan := make(chan error, 1)
es.msgAndResultBySequenceNumber[msg.SequenceNumber] = &msgAndResult{msg, resultChan}
roundInfo.msgAndResultBySequenceNumber[msg.SequenceNumber] = &msgAndResult{msg, resultChan}
@Tristan-Wilson (Member) commented:

Can we set a small configurable max number of queued messages here? I think 10 should be enough. This mechanism is just to support out of order delivery which should be minimal. We don't want an unlimited number of messages to be queued and unlocked instantly when the missing sequence number comes in.

@ganeshvanahalli (Contributor, Author) replied:

I agree; this should mitigate such risks.
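
As a rough illustration of the agreed-upon cap (not the PR's actual code), the check could sit right where a tx with a future sequence number is about to be queued. The config field path and error text below are assumptions based on the new flag:

```go
// Only txs whose sequence number is ahead of the expected one get queued, so
// the cap applies to this map. Per the flag semantics: 0 means the check is
// disabled, a negative value means never queue. Field names are assumptions.
maxQueued := es.seqConfigFetcher().Timeboost.MaxQueuedTxCount
if maxQueued < 0 || (maxQueued > 0 && len(roundInfo.msgAndResultBySequenceNumber) >= int(maxQueued)) {
	return fmt.Errorf("rejecting tx with future sequence number %d: queued tx limit reached", msg.SequenceNumber)
}
resultChan := make(chan error, 1)
roundInfo.msgAndResultBySequenceNumber[msg.SequenceNumber] = &msgAndResult{msg, resultChan}
```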

@Tristan-Wilson (Member) commented:

Here's my analysis where I looked for any concurrency issues.

We have a ticker that goes off (roughly) when the round starts. It deletes the previous round's roundControl entry, so that round no longer has a controller.

We have a ticker that goes off at max block speed.

  • On AuctionResolved events it stores a new round->controller address in roundControl
  • On SetExpressLaneController (transfer) events it
    • stores a new round->controller address in roundControl, overwriting the previous
    • locks roundInfoMutex, sets a new expressLaneRoundInfo for this round, unlocks roundInfoMutex

We have txs coming in at random times.

  • They are first Validated, which checks the message's sender against roundControl for the current round.
  • Then they are Sequenced, which
    • locks the roundInfoMutex
    • checks the msg's sender against roundControl again
    • performs various reads/writes of the current round's roundInfo as messages are put into the roundInfo queue, removed as they are queued with the sequencer, and the sequence number is updated
    • releases the roundInfoMutex

Considering some scenarios:
If round control is transferred while we're in the middle of sequencing an express lane tx, the round info (which contains the seq number and msg queue) is not reset until that tx is queued. I don't think this makes the unavoidable race between when we see the transfer event and when we see the txs any worse. We should make sure the max-block-speed ticker is the best way to pick up the events with minimal latency, to minimize this issue.

The goroutine running the ELS new round ticker could be scheduled late and generate its tick slightly late. This shouldn't be a problem because the deletion is only needed to avoid unbounded growth. A tx that comes in during this window (e.g. the new round has started but the tick hasn't occurred yet) would be rejected at validation, because validation re-derives the current round from the system time.

If we are in the middle of handling a SetExpressLaneController event and get an express lane tx from the old controller: if the handling of SetExpressLaneController has already changed roundControl, the tx will be rejected. If not, and SequenceExpressLaneSubmission gets the roundInfoMutex first, the tx may be queued with the sequencer. If the SetExpressLaneController handling gets the mutex first, then when SequenceExpressLaneSubmission gets the mutex it re-checks roundControl, which rejects the tx.

So in summary I think there is inherent raciness around SetExpressLaneController (transfer) events and express lane txs, which at the moment I think is unavoidable. We may still process messages from the old controller, or fail to process messages from the new controller, for a brief time after the actual transfer occurs on chain, due to delays in reading the events and because we don't interrupt the processing of a message upon receiving an event. But aside from this, I don't think the event and tx happening concurrently can cause any invalid state.
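
To make the mutex ordering above concrete, here is a self-contained sketch of the recheck-under-lock pattern described. All type and field names are simplified stand-ins for the real expressLaneService, not the actual implementation:

```go
package sketch

import (
	"errors"
	"sync"
)

var errNotController = errors.New("not the express lane controller for this round")

// Simplified stand-ins for the real types.
type submission struct {
	Round          uint64
	Sender         string
	SequenceNumber uint64
}

type expressLaneService struct {
	roundInfoMutex sync.Mutex
	roundControl   sync.Map // round number -> controller address
}

// sequenceSubmission re-checks roundControl after taking roundInfoMutex, so a
// SetExpressLaneController transfer handled just before the lock was acquired
// causes the tx to be rejected rather than sequenced.
func (es *expressLaneService) sequenceSubmission(msg *submission) error {
	es.roundInfoMutex.Lock()
	defer es.roundInfoMutex.Unlock()

	controller, ok := es.roundControl.Load(msg.Round)
	if !ok || controller.(string) != msg.Sender {
		return errNotController
	}
	// Reads/writes of this round's roundInfo (queueing by sequence number,
	// advancing the expected sequence number) happen here, still under the lock.
	return nil
}
```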

@ganeshvanahalli changed the title from "Improve timeboost implementation" to "[config change] Improve timeboost implementation" on Jan 8, 2025
@Tristan-Wilson merged commit be11793 into express-lane-timeboost on Jan 9, 2025
10 checks passed
@Tristan-Wilson deleted the improve-timeboost-code branch on January 9, 2025 at 07:31